Model Selection

Efficient vision-language model

# Efficient vision-language model

OmniGen2 is a powerful and efficient unified multimodal model composed of a 3B vision-language model and a 4B diffusion model, supporting visual understanding, text-to-image generation, instruction-guided image editing, and context generation.

Mobileclip B LT OpenCLIP

MobileCLIP-B (LT) is an efficient image-text model developed by Apple, achieving fast zero-shot image classification through multimodal reinforcement training, outperforming similar models.

MobileVLM is a lightweight multi-modal vision-language model designed specifically for mobile devices, supporting efficient image understanding and text generation tasks.

Featured Recommended AI Models

AIbase

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご

© 2025AIbase